208 research outputs found
Structured, sparse regression with application to HIV drug resistance
We introduce a new version of forward stepwise regression. Our modification
finds solutions to regression problems where the selected predictors appear in
a structured pattern, with respect to a predefined distance measure over the
candidate predictors. Our method is motivated by the problem of predicting
HIV-1 drug resistance from protein sequences. We find that our method improves
the interpretability of drug resistance predictions while achieving predictive
accuracy comparable to standard methods. We also demonstrate our method in a simulation
study and present some theoretical results and connections.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), DOI: http://dx.doi.org/10.1214/10-AOAS428
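The structured selection idea can be pictured as a greedy forward search whose score rewards fit to the residual but penalizes distance to already-selected predictors. The sketch below is a simplified illustration under that reading, not the authors' exact algorithm; the distance matrix `dist` and trade-off parameter `lam` are assumptions introduced here.

```python
import numpy as np

def structured_forward_stepwise(X, y, dist, k, lam=1.0):
    """Greedy forward selection that favors predictors close (under `dist`)
    to those already chosen.  `dist` is a (p, p) distance matrix over the
    candidate predictors; `lam` trades off fit against structure.
    A simplified sketch, not the paper's exact algorithm."""
    n, p = X.shape
    selected = []
    resid = y - y.mean()
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            xj = X[:, j]
            # correlation of candidate with current residual
            corr = abs(xj @ resid) / (np.linalg.norm(xj) + 1e-12)
            # structure penalty: distance to the nearest selected predictor
            penalty = min(dist[j, s] for s in selected) if selected else 0.0
            score = corr - lam * penalty
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        # refit least squares on the selected set and update the residual
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    return selected
```

With `lam = 0`, this reduces to ordinary forward stepwise regression; larger `lam` forces the selected predictors into a tighter spatial pattern.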
Correcting for heterogeneity in real-time epidemiological indicators
Auxiliary data sources have become increasingly important in epidemiological
surveillance, as they are often available at a finer spatial and temporal
resolution, larger coverage, and lower latency than traditional surveillance
signals. We describe the problem of heterogeneity in the signals derived from
these data sources, where spatial and/or temporal biases are present. We
present a method that uses a "guiding" signal to correct for
these biases and produce a more reliable signal that can be used for modeling
and forecasting. The method assumes that the heterogeneity can be approximated
by a low-rank matrix and that the temporal heterogeneity is smooth over time.
We also present a hyperparameter selection algorithm to choose the parameters
representing the matrix rank and degree of temporal smoothness of the
corrections. In the absence of ground truth, we use maps and plots to argue
that this method does indeed reduce heterogeneity. Reducing heterogeneity from
auxiliary data sources greatly increases their utility in modeling and
forecasting epidemics.
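One way to picture the correction: treat the discrepancy between the auxiliary and guiding signals as a bias matrix, smooth it in time, and keep only a low-rank approximation. The sketch below is a simplified illustration of that idea, not the paper's estimator; `rank` and `smooth` are stand-ins for the hyperparameters the selection algorithm would tune.

```python
import numpy as np

def correct_heterogeneity(aux, guide, rank=2, smooth=5):
    """Estimate spatial/temporal bias in an auxiliary signal matrix
    (rows = locations, cols = time) using a guiding signal, assuming the
    bias is low-rank and smooth over time.  A simplified sketch of the
    idea, not the paper's estimator."""
    diff = aux - guide                      # raw discrepancy
    # temporal smoothness assumption: moving average along the time axis
    kernel = np.ones(smooth) / smooth
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, diff)
    # low-rank assumption: truncated SVD of the smoothed discrepancy
    U, s, Vt = np.linalg.svd(smoothed, full_matrices=False)
    bias = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return aux - bias
```

Subtracting only the smoothed, low-rank part of the discrepancy removes systematic spatial/temporal bias while leaving the signal's genuine short-term variation intact.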
Computationally Assisted Quality Control for Public Health Data Streams
Irregularities in public health data streams (like COVID-19 Cases) hamper
data-driven decision-making for public health stakeholders. A real-time,
computer-generated list of the most important, outlying data points from
thousands of daily-updated public health data streams could assist an expert
reviewer in identifying these irregularities. However, existing outlier
detection frameworks perform poorly on this task because they do not account
for the data volume or for the statistical properties of public health streams.
Accordingly, we developed FlaSH (Flagging Streams in public Health), a
practical outlier detection framework for public health data users that uses
simple, scalable models to capture these statistical properties explicitly. In
an experiment where human experts evaluate FlaSH and existing methods
(including deep learning approaches), FlaSH scales to the data volume of this
task, matches or exceeds these other methods in mean accuracy, and identifies
the outlier points that users empirically rate as more helpful. Based on these
results, FlaSH has been deployed on data streams used by public health
stakeholders.
Comment: https://github.com/cmu-delphi/covidcast-indicators/tree/main/_delphi_utils_python/delphi_utils/flash_eva
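To illustrate the kind of simple, scalable per-stream model such a framework relies on, the toy sketch below flags points that deviate strongly from a trailing rolling median, scaled by a robust spread estimate. It illustrates the general approach only; it is not FlaSH's actual models.

```python
import numpy as np

def flag_outliers(stream, window=7, z_thresh=3.0):
    """Flag outlying points in a single count stream by comparing each
    value to a trailing rolling median, scaled by the median absolute
    deviation (MAD).  A toy sketch in the spirit of flagging frameworks
    like FlaSH, not its actual models."""
    stream = np.asarray(stream, dtype=float)
    flags = []
    for t in range(window, len(stream)):
        hist = stream[t - window:t]
        med = np.median(hist)
        mad = np.median(np.abs(hist - med)) + 1e-9
        z = (stream[t] - med) / (1.4826 * mad)  # 1.4826: Gaussian consistency
        if abs(z) > z_thresh:
            flags.append((t, z))
    # most extreme points first, for expert review
    return sorted(flags, key=lambda f: -abs(f[1]))
```

Because each stream is processed independently with O(n) work, a model of this shape scales to thousands of daily-updated streams, and the sorted output gives the reviewer a ranked list rather than a raw dump.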
Flexible Modeling of Epidemics with an Empirical Bayes Framework
Seasonal influenza epidemics cause consistent, considerable, widespread loss
annually in terms of economic burden, morbidity, and mortality. With access to
accurate and reliable forecasts of a current or upcoming influenza epidemic's
behavior, policy makers can design and implement more effective
countermeasures. We developed a framework for in-season forecasts of epidemics
using a semiparametric Empirical Bayes framework, and applied it to predict the
weekly percentage of outpatient doctor visits for influenza-like illness, as
well as the season onset, duration, peak time, and peak height, with and
without additional data from Google Flu Trends, as part of the CDC's 2013-2014
"Predict the Influenza Season Challenge". Previous work on epidemic modeling
has focused on developing mechanistic models of disease behavior and applying
time series tools to explain historical data. However, these models may not
accurately capture the range of possible behaviors that we may see in the
future. Our approach instead produces possibilities for the epidemic curve of
the season of interest using modified versions of data from previous seasons,
allowing for reasonable variations in the timing, pace, and intensity of the
seasonal epidemics, as well as noise in observations. Since the framework does
not make strict domain-specific assumptions, it can easily be applied to other
diseases as well. Another important advantage of this method is that it
produces a complete posterior distribution for any desired forecasting target,
rather than mere point predictions. We report prospective
influenza-like-illness forecasts that were made for the 2013-2014 U.S.
influenza season, and compare the framework's cross-validated prediction error
on historical data to that of a variety of simpler baseline predictors.
Comment: 52 pages
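The core of the approach, generating candidate epidemic curves by perturbing past seasons and weighting them against the observed partial season, can be sketched as importance sampling. The shift and scale ranges and the noise level `sigma` below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def eb_forecast(past_seasons, observed, n_samples=2000, sigma=0.5, rng=None):
    """Empirical-Bayes-style forecasting sketch: build a prior over full
    epidemic curves by randomly shifting and rescaling historical seasons,
    then weight each candidate by how well it matches the weeks observed
    so far.  Returns posterior samples of the full curve.  A simplified
    illustration of the framework, not the paper's implementation."""
    rng = rng or np.random.default_rng(0)
    t_obs = len(observed)
    curves, weights = [], []
    for _ in range(n_samples):
        base = past_seasons[rng.integers(len(past_seasons))]
        shift = rng.integers(-3, 4)       # timing variation (weeks)
        scale = rng.uniform(0.7, 1.3)     # intensity variation
        curve = scale * np.roll(base, shift)
        # likelihood of the observed partial season under Gaussian noise
        resid = observed - curve[:t_obs]
        weights.append(np.exp(-0.5 * np.sum(resid**2) / sigma**2))
        curves.append(curve)
    w = np.array(weights)
    w /= w.sum()
    # importance resampling yields equally-weighted posterior draws
    idx = rng.choice(n_samples, size=n_samples, p=w)
    return np.array(curves)[idx]
```

Any forecasting target (peak height, peak week, season duration) then has a full posterior: compute the target on each resampled curve and summarize, rather than reporting a single point prediction.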
Stream ecosystem responses to an extreme rainfall event across multiple catchments in southeast Alaska
Floods are a key component of the flow regime of many rivers and a major structuring force of stream communities. Climate change is predicted to increase the frequency of extreme rainfall (i.e. return intervals > 100 years) leading to extensive flooding, but the ecological effects of such events are not well understood. Comparative studies of flood impacts are scarce, despite the clear need to understand the potentially contingent responses of multiple independent stream systems to extreme weather occurring at meso- and synoptic spatial scales. We describe the effect of an extreme rainfall event affecting an area >100,000 km2 that caused extensive flooding in SE Alaska. Responses of channel morphology and three key biological groups (meiofauna, macroinvertebrates and fish) were assessed in four separate and recently deglaciated stream catchments of contrasting age (38-180 years) by comparing samples taken before and after the event. Ecological responses to the rainfall and subsequent flooding differed markedly across the four catchments in response to variations in rainfall intensity and to factors such as channel morphology, stream sediment composition and catchment vegetation type and cover, which were themselves related to stream age. Our study demonstrates the value of considering multiple response variables when assessing the effects of extreme events, and highlights the potential for contrasting biological responses to extreme events across catchments. We advocate more comparative studies to understand how extreme rainfall and flooding affect ecosystem responses across multiple catchments.
A probabilistic generative model for GO enrichment analysis
The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of multiple hypotheses that are being tested for each study. In addition, categories often overlap with both direct parents/descendants and other distant categories in the hierarchical structure. This makes it hard to determine if the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems, we developed a generative probabilistic model which identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in the selected gene set and GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human, our method was able to correctly identify both general and specific enriched categories which were overlooked by other methods.
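The selection step can be pictured, in much-simplified form, as a greedy cover of the selected gene set: repeatedly pick the category that explains the most still-unexplained genes, and stop when the gain is small enough to attribute the leftovers to noise. This set-cover sketch stands in for the paper's generative probabilistic model; `min_gain` crudely plays the role of its noise tolerance.

```python
def select_categories(selected_genes, categories, max_k=5, min_gain=2):
    """Greedy sketch: choose a small set of categories that together
    explain the selected gene set, stopping when no category adds enough
    newly-explained genes.  A set-cover-style simplification of the
    paper's generative probabilistic model, not the model itself.
    `categories` maps category name -> set of annotated genes."""
    remaining = set(selected_genes)
    chosen = []
    for _ in range(max_k):
        best, best_gain = None, 0
        for name, genes in categories.items():
            if name in chosen:
                continue
            gain = len(remaining & genes)   # newly-explained genes
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None or best_gain < min_gain:
            break   # leftover genes are attributed to noise
        chosen.append(best)
        remaining -= categories[best]
    return chosen
```

Because a broad parent category adds little once its specific children are chosen (and vice versa), preferring high marginal gain naturally discourages the redundant parent/child reporting the abstract describes.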
Results from the Centers for Disease Control and Prevention's Predict the 2013-2014 Influenza Season Challenge
Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013-14 United States influenza season. Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013, to March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Results: Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within one week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1%, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly predicted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts. © 2016 The Author(s)
Using data-driven rules to predict mortality in severe community acquired pneumonia
Prediction of patient-centered outcomes in hospitals is useful for performance benchmarking, resource allocation, and guidance regarding active treatment and withdrawal of care. Yet, its use by clinicians is limited by the complexity of available tools and the amount of data required. We propose to use Disjunctive Normal Forms as a novel approach to predict hospital and 90-day mortality from instance-based patient data, comprising demographic, genetic, and physiologic information in a large cohort of patients admitted with severe community acquired pneumonia. We develop two algorithms to efficiently learn Disjunctive Normal Forms, which yield easy-to-interpret rules that explicitly map data to the outcome of interest. Disjunctive Normal Forms achieve higher predictive performance than a set of state-of-the-art machine learning models, and unveil insights unavailable with standard methods. Disjunctive Normal Forms constitute an intuitive set of prediction rules that could be easily implemented to predict outcomes and guide criteria-based clinical decision making and clinical trial execution, and are thus of greater practical usefulness than currently available prediction tools. The Java implementation of the tool JavaDNF will be publicly available. © 2014 Wu et al.
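A Disjunctive Normal Form maps directly to an OR of ANDs, which is what makes the learned rules easy to read and to implement. The sketch below shows how such a rule is evaluated; the example rule itself (age, blood pressure, and lactate thresholds) is made up for illustration, not one learned from the pneumonia cohort.

```python
def dnf_predict(record, dnf):
    """Evaluate a Disjunctive Normal Form rule: predict positive if ANY
    conjunction is satisfied, where a conjunction is satisfied only if
    ALL of its literals hold for the given patient record."""
    return any(
        all(literal(record) for literal in conjunction)
        for conjunction in dnf
    )

# Hypothetical example rule, for illustration only: predict mortality if
#   (age > 80 AND systolic BP < 90)  OR  (lactate > 4)
example_dnf = [
    [lambda r: r["age"] > 80, lambda r: r["sbp"] < 90],
    [lambda r: r["lactate"] > 4.0],
]
```

Each conjunction reads as a clinical criterion, so the whole rule can be checked at the bedside without a computer, which is the interpretability advantage the abstract emphasizes over black-box models.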
- …